{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Amazon SageMaker Lineage\n",
    "Amazon SageMaker Lineage enables events that happen within SageMaker to be traced via a graph structure.  The data simplifies generating reports, making comparisons, or discovering relationships between events.  For example easily trace both how a model was generated and where the model was deployed. \n",
    "\n",
    "The lineage graph is created automatically by SageMaker and you can directly create or modify your own graphs.\n",
    "\n",
    "\n",
    "## Key Concepts\n",
    "\n",
    "* **Lineage Graph** - A connected graph tracing your machine learning workflow end to end. \n",
    "* **Artifacts** - Represents a URI addressable object or data.  Artifacts are typically inputs or outputs to Actions.  \n",
    "* **Actions**  - Represents an action taken such as a computation, transformation, or job.  \n",
    "* **Contexts** - Provides a method to logically group other entities.\n",
    "* **Associations** - A directed edge in the lineage graph that links two entities.\n",
    "* **Lineage Traversal** - Starting from an arbitrary point trace the lineage graph to discover and analyze relationships between steps in your workflow.\n",
    "* **Experiments** - Experiment entites (Experiments, Trials, and Trial Components) are also part of the lineage graph and can be associated wtih Artifacts, Actions, or Contexts.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Notebook Overview\n",
    "\n",
    "This notebook demonstrates how to:\n",
    "* Understand the basics of lineage entities.\n",
    "* Create and associate lineage entities to track your workflow.\n",
    "* Traverse the associations between lineage entities."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "\n",
    "Select the `Python 3 (Data Science)` kernel in SageMaker Studio."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from botocore.exceptions import ClientError\n",
    "\n",
    "import os\n",
    "import sagemaker\n",
    "import logging\n",
    "import boto3\n",
    "import sagemaker\n",
    "import pandas as pd\n",
    "\n",
    "sess   = sagemaker.Session()\n",
    "bucket = sess.default_bucket()\n",
    "role = sagemaker.get_execution_role()\n",
    "region = boto3.Session().region_name\n",
    "\n",
    "sm = boto3.Session().client(service_name='sagemaker', region_name=region)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime\n",
    "from sagemaker.lineage.context import Context\n",
    "from sagemaker.lineage.action import Action\n",
    "from sagemaker.lineage.association import Association\n",
    "from sagemaker.lineage.artifact import Artifact"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.lineage.visualizer import LineageTableVisualizer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "unique_id = str(int(datetime.now().replace(microsecond=0).timestamp()))\n",
    "\n",
    "print(f'Unique id is {unique_id}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create an example context\n",
    "\n",
    "# the name must be unique across all other contexts\n",
    "context_name = f'machine-learning-workflow-{unique_id}' \n",
    "\n",
    "ml_workflow_context = Context.create(\n",
    "    context_name=context_name, \n",
    "    context_type='MLWorkflow',    \n",
    "    source_uri=unique_id,\n",
    "    # properties services as a method to store metdata on lineage entities in additional to Tags\n",
    "    properties={\"example\": \"true\"})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "machine-learning-workflow-1609278631\n"
     ]
    }
   ],
   "source": [
    "# list all the contexts\n",
    "\n",
    "contexts = Context.list(sort_by='CreationTime', sort_order='Descending')\n",
    "\n",
    "for ctx in contexts:\n",
    "    print(ctx.context_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create an example action and associate it with the context\n",
    "\n",
    "model_build_action = Action.create(\n",
    "    action_name=f\"model-build-step-{unique_id}\",\n",
    "    action_type=\"ModelBuild\",\n",
    "    source_uri=unique_id,\n",
    "    properties={\"Example\": \"Metadata\"},\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Association Type can be Produced|DerivedFrom|AssociatedWith|ContributedTo\n",
    "context_action_association = Association.create(\n",
    "    source_arn=ml_workflow_context.context_arn,\n",
    "    destination_arn=model_build_action.action_arn,\n",
    "    association_type='AssociatedWith'\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "model-build-step-1609278631 has an incoming association from machine-learning-workflow-1609278631\n",
      "machine-learning-workflow-1609278631 has an outgoing association to model-build-step-1609278631\n"
     ]
    }
   ],
   "source": [
    "# now the Action and Context are associated:\n",
    "incoming_associations_to_action = Association.list(destination_arn=model_build_action.action_arn)\n",
    "for association in incoming_associations_to_action:\n",
    "    print(f'{model_build_action.action_name} has an incoming association from {association.source_name}')\n",
    "\n",
    "outgoing_associations_from_context = Association.list(source_arn=ml_workflow_context.context_arn)\n",
    "for association in outgoing_associations_from_context:\n",
    "    print(f'{ml_workflow_context.context_name} has an outgoing association to {association.destination_name}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create an artifact representing inputs to the model building action\n",
    "input_test_images = Artifact.create(\n",
    "    artifact_name='mnist-test-images',\n",
    "    artifact_type='TestData',\n",
    "    source_types=[{\"SourceIdType\": \"Custom\", \"Value\": unique_id}],\n",
    "    source_uri='https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/t10k-images-idx3-ubyte.gz')\n",
    "\n",
    "input_test_labels = Artifact.create(\n",
    "    artifact_name='mnist-test-labels',\n",
    "    artifact_type='TestLabels',\n",
    "    source_types=[{\"SourceIdType\": \"Custom\", \"Value\": unique_id}],\n",
    "    source_uri='https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/t10k-labels-idx1-ubyte.gz')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create an artifact representing a trained model\n",
    "output_model = Artifact.create(\n",
    "    artifact_name='mnist-model',\n",
    "    artifact_type='Model',\n",
    "    source_types=[{\"SourceIdType\": \"Custom\", \"Value\": unique_id}],\n",
    "    source_uri='s3://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/model/tensorflow-training-2020-11-20-23-57-13-077/model.tar.gz'\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Association(sagemaker_session=<sagemaker.session.Session object at 0x7f5bdfa67d10>,source_arn='arn:aws:sagemaker:us-east-1:231218423789:artifact/8eb66641d3961643ec91e3e7e3f49b28',destination_arn='arn:aws:sagemaker:us-east-1:231218423789:action/model-build-step-1609278631',association_type=None,response_metadata={'RequestId': '94b3957d-fe88-4a33-8a90-16c55444c408', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '94b3957d-fe88-4a33-8a90-16c55444c408', 'content-type': 'application/x-amz-json-1.1', 'content-length': '193', 'date': 'Tue, 29 Dec 2020 21:50:51 GMT'}, 'RetryAttempts': 0})"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# associate the data set artifact with an incoming association to the example action\n",
    "Association.create(source_arn=input_test_images.artifact_arn, destination_arn=model_build_action.action_arn)\n",
    "Association.create(source_arn=input_test_labels.artifact_arn, destination_arn=model_build_action.action_arn)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Association(sagemaker_session=<sagemaker.session.Session object at 0x7f5bde901590>,source_arn='arn:aws:sagemaker:us-east-1:231218423789:action/model-build-step-1609278631',destination_arn='arn:aws:sagemaker:us-east-1:231218423789:artifact/7bf4d7516ddfd81e9acfec3e20f0f71a',association_type=None,response_metadata={'RequestId': 'ec9f26f6-7153-4547-be1e-3b3fe30fc022', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'ec9f26f6-7153-4547-be1e-3b3fe30fc022', 'content-type': 'application/x-amz-json-1.1', 'content-length': '193', 'date': 'Tue, 29 Dec 2020 21:50:52 GMT'}, 'RetryAttempts': 0})"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# associate the example action with an outgoing association to the model artifact\n",
    "Association.create(source_arn=model_build_action.action_arn, destination_arn=output_model.artifact_arn)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# List All Artifacts With SageMaker ArtifactAnalytics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.analytics import ArtifactAnalytics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.analytics import ArtifactAnalytics\n",
    "analytics = ArtifactAnalytics()\n",
    "df = analytics.dataframe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ArtifactName</th>\n",
       "      <th>ArtifactArn</th>\n",
       "      <th>ArtifactType</th>\n",
       "      <th>ArtifactSourceUri</th>\n",
       "      <th>CreationTime</th>\n",
       "      <th>LastModifiedTime</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>None</td>\n",
       "      <td>arn:aws:sagemaker:us-east-1:231218423789:artif...</td>\n",
       "      <td>Model</td>\n",
       "      <td>arn:aws:sagemaker:us-east-1:231218423789:model...</td>\n",
       "      <td>2020-12-22 17:43:43.494000+00:00</td>\n",
       "      <td>2020-12-22 19:37:07.535000+00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>None</td>\n",
       "      <td>arn:aws:sagemaker:us-east-1:231218423789:artif...</td>\n",
       "      <td>Model</td>\n",
       "      <td>s3://sagemaker-us-east-1-231218423789/BERT/out...</td>\n",
       "      <td>2020-12-22 17:43:40.963000+00:00</td>\n",
       "      <td>2020-12-22 17:43:40.963000+00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>None</td>\n",
       "      <td>arn:aws:sagemaker:us-east-1:231218423789:artif...</td>\n",
       "      <td>DataSet</td>\n",
       "      <td>s3://sagemaker-us-east-1-231218423789/sagemake...</td>\n",
       "      <td>2020-12-22 17:29:01.770000+00:00</td>\n",
       "      <td>2020-12-22 17:29:01.770000+00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>None</td>\n",
       "      <td>arn:aws:sagemaker:us-east-1:231218423789:artif...</td>\n",
       "      <td>DataSet</td>\n",
       "      <td>s3://sagemaker-us-east-1-231218423789/sagemake...</td>\n",
       "      <td>2020-12-22 17:29:01.720000+00:00</td>\n",
       "      <td>2020-12-22 17:29:01.720000+00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>None</td>\n",
       "      <td>arn:aws:sagemaker:us-east-1:231218423789:artif...</td>\n",
       "      <td>DataSet</td>\n",
       "      <td>s3://sagemaker-us-east-1-231218423789/sagemake...</td>\n",
       "      <td>2020-12-22 17:29:01.656000+00:00</td>\n",
       "      <td>2020-12-22 17:29:01.656000+00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>124</th>\n",
       "      <td>None</td>\n",
       "      <td>arn:aws:sagemaker:us-east-1:231218423789:artif...</td>\n",
       "      <td>DataSet</td>\n",
       "      <td>s3://sagemaker-us-east-1-231218423789/tensorfl...</td>\n",
       "      <td>2020-12-18 17:32:41.003000+00:00</td>\n",
       "      <td>2020-12-18 17:32:41.003000+00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>125</th>\n",
       "      <td>None</td>\n",
       "      <td>arn:aws:sagemaker:us-east-1:231218423789:artif...</td>\n",
       "      <td>DataSet</td>\n",
       "      <td>s3://sagemaker-us-east-1-231218423789/tensorfl...</td>\n",
       "      <td>2020-12-18 17:32:40.988000+00:00</td>\n",
       "      <td>2020-12-18 17:32:40.988000+00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>126</th>\n",
       "      <td>None</td>\n",
       "      <td>arn:aws:sagemaker:us-east-1:231218423789:artif...</td>\n",
       "      <td>DataSet</td>\n",
       "      <td>s3://sagemaker-us-east-1-231218423789/tensorfl...</td>\n",
       "      <td>2020-12-18 17:32:40.922000+00:00</td>\n",
       "      <td>2020-12-18 17:32:40.922000+00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>127</th>\n",
       "      <td>None</td>\n",
       "      <td>arn:aws:sagemaker:us-east-1:231218423789:artif...</td>\n",
       "      <td>Image</td>\n",
       "      <td>503895931360.dkr.ecr.us-east-1.amazonaws.com/s...</td>\n",
       "      <td>2020-12-18 17:32:40.877000+00:00</td>\n",
       "      <td>2020-12-18 17:32:40.877000+00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>128</th>\n",
       "      <td>None</td>\n",
       "      <td>arn:aws:sagemaker:us-east-1:231218423789:artif...</td>\n",
       "      <td>Image</td>\n",
       "      <td>763104351884.dkr.ecr.us-east-1.amazonaws.com/t...</td>\n",
       "      <td>2020-12-18 17:32:37.337000+00:00</td>\n",
       "      <td>2020-12-18 17:32:37.337000+00:00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>129 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    ArtifactName                                        ArtifactArn  \\\n",
       "0           None  arn:aws:sagemaker:us-east-1:231218423789:artif...   \n",
       "1           None  arn:aws:sagemaker:us-east-1:231218423789:artif...   \n",
       "2           None  arn:aws:sagemaker:us-east-1:231218423789:artif...   \n",
       "3           None  arn:aws:sagemaker:us-east-1:231218423789:artif...   \n",
       "4           None  arn:aws:sagemaker:us-east-1:231218423789:artif...   \n",
       "..           ...                                                ...   \n",
       "124         None  arn:aws:sagemaker:us-east-1:231218423789:artif...   \n",
       "125         None  arn:aws:sagemaker:us-east-1:231218423789:artif...   \n",
       "126         None  arn:aws:sagemaker:us-east-1:231218423789:artif...   \n",
       "127         None  arn:aws:sagemaker:us-east-1:231218423789:artif...   \n",
       "128         None  arn:aws:sagemaker:us-east-1:231218423789:artif...   \n",
       "\n",
       "    ArtifactType                                  ArtifactSourceUri  \\\n",
       "0          Model  arn:aws:sagemaker:us-east-1:231218423789:model...   \n",
       "1          Model  s3://sagemaker-us-east-1-231218423789/BERT/out...   \n",
       "2        DataSet  s3://sagemaker-us-east-1-231218423789/sagemake...   \n",
       "3        DataSet  s3://sagemaker-us-east-1-231218423789/sagemake...   \n",
       "4        DataSet  s3://sagemaker-us-east-1-231218423789/sagemake...   \n",
       "..           ...                                                ...   \n",
       "124      DataSet  s3://sagemaker-us-east-1-231218423789/tensorfl...   \n",
       "125      DataSet  s3://sagemaker-us-east-1-231218423789/tensorfl...   \n",
       "126      DataSet  s3://sagemaker-us-east-1-231218423789/tensorfl...   \n",
       "127        Image  503895931360.dkr.ecr.us-east-1.amazonaws.com/s...   \n",
       "128        Image  763104351884.dkr.ecr.us-east-1.amazonaws.com/t...   \n",
       "\n",
       "                        CreationTime                 LastModifiedTime  \n",
       "0   2020-12-22 17:43:43.494000+00:00 2020-12-22 19:37:07.535000+00:00  \n",
       "1   2020-12-22 17:43:40.963000+00:00 2020-12-22 17:43:40.963000+00:00  \n",
       "2   2020-12-22 17:29:01.770000+00:00 2020-12-22 17:29:01.770000+00:00  \n",
       "3   2020-12-22 17:29:01.720000+00:00 2020-12-22 17:29:01.720000+00:00  \n",
       "4   2020-12-22 17:29:01.656000+00:00 2020-12-22 17:29:01.656000+00:00  \n",
       "..                               ...                              ...  \n",
       "124 2020-12-18 17:32:41.003000+00:00 2020-12-18 17:32:41.003000+00:00  \n",
       "125 2020-12-18 17:32:40.988000+00:00 2020-12-18 17:32:40.988000+00:00  \n",
       "126 2020-12-18 17:32:40.922000+00:00 2020-12-18 17:32:40.922000+00:00  \n",
       "127 2020-12-18 17:32:40.877000+00:00 2020-12-18 17:32:40.877000+00:00  \n",
       "128 2020-12-18 17:32:37.337000+00:00 2020-12-18 17:32:37.337000+00:00  \n",
       "\n",
       "[129 rows x 6 columns]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "# More Analytics\n",
    "\n",
    "from sagemaker.analytics import (\n",
    "    AnalyticsMetricsBase,\n",
    "    HyperparameterTuningJobAnalytics,\n",
    "    TrainingJobAnalytics,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# tuner = HyperparameterTuningJobAnalytics(\"my-tuning-job\", sagemaker_session=session)\n",
    "# df = tuner.dataframe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# trainer = TrainingJobAnalytics(\"my-training-job\", [\"train:acc\"], sagemaker_session=session)\n",
    "# df = trainer.dataframe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Visualize Lineage Graph as DataFrame"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "See: \n",
    "https://github.com/aws/sagemaker-python-sdk/blob/b97e355d6a9a627d26a81b773b414d5ff3b405fa/src/sagemaker/lineage/visualizer.py\n",
    "\n",
    "show(\n",
    "    trial_component_name=None,\n",
    "    training_job_name=None,\n",
    "    processing_job_name=None,\n",
    "    pipeline_execution_step=None,\n",
    "    model_package_arn=None,\n",
    "    endpoint_arn=None,\n",
    "):"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Artifacts for Processing Job"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name/Source</th>\n",
       "      <th>Direction</th>\n",
       "      <th>Type</th>\n",
       "      <th>Association Type</th>\n",
       "      <th>Lineage Type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>preprocess-scikit-text-to-bert.py</td>\n",
       "      <td>Input</td>\n",
       "      <td>DataSet</td>\n",
       "      <td>ContributedTo</td>\n",
       "      <td>artifact</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>s3://...2020-12-29-22-19-21-922/output/bert-test</td>\n",
       "      <td>Output</td>\n",
       "      <td>DataSet</td>\n",
       "      <td>Produced</td>\n",
       "      <td>artifact</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>s3://...2-29-22-19-21-922/output/bert-validation</td>\n",
       "      <td>Output</td>\n",
       "      <td>DataSet</td>\n",
       "      <td>Produced</td>\n",
       "      <td>artifact</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>s3://...020-12-29-22-19-21-922/output/bert-train</td>\n",
       "      <td>Output</td>\n",
       "      <td>DataSet</td>\n",
       "      <td>Produced</td>\n",
       "      <td>artifact</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        Name/Source Direction     Type  \\\n",
       "0                 preprocess-scikit-text-to-bert.py     Input  DataSet   \n",
       "1  s3://...2020-12-29-22-19-21-922/output/bert-test    Output  DataSet   \n",
       "2  s3://...2-29-22-19-21-922/output/bert-validation    Output  DataSet   \n",
       "3  s3://...020-12-29-22-19-21-922/output/bert-train    Output  DataSet   \n",
       "\n",
       "  Association Type Lineage Type  \n",
       "0    ContributedTo     artifact  \n",
       "1         Produced     artifact  \n",
       "2         Produced     artifact  \n",
       "3         Produced     artifact  "
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import time\n",
    "from sagemaker.lineage.visualizer import LineageTableVisualizer\n",
    "\n",
    "viz = LineageTableVisualizer(sagemaker.session.Session())\n",
    "\n",
    "df = viz.show(processing_job_name='pipelines-jvk06bcbvsxx-processing-v21c4wmlos')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Artifacts for Training Job"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name/Source</th>\n",
       "      <th>Direction</th>\n",
       "      <th>Type</th>\n",
       "      <th>Association Type</th>\n",
       "      <th>Lineage Type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>s3://...2020-12-29-22-19-21-922/output/bert-test</td>\n",
       "      <td>Input</td>\n",
       "      <td>DataSet</td>\n",
       "      <td>ContributedTo</td>\n",
       "      <td>artifact</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>s3://...2-29-22-19-21-922/output/bert-validation</td>\n",
       "      <td>Input</td>\n",
       "      <td>DataSet</td>\n",
       "      <td>ContributedTo</td>\n",
       "      <td>artifact</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>s3://...020-12-29-22-19-21-922/output/bert-train</td>\n",
       "      <td>Input</td>\n",
       "      <td>DataSet</td>\n",
       "      <td>ContributedTo</td>\n",
       "      <td>artifact</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>76310...ws.com/tensorflow-training:2.1.0-cpu-py3</td>\n",
       "      <td>Input</td>\n",
       "      <td>Image</td>\n",
       "      <td>ContributedTo</td>\n",
       "      <td>artifact</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>model.tar.gz</td>\n",
       "      <td>Output</td>\n",
       "      <td>Model</td>\n",
       "      <td>Produced</td>\n",
       "      <td>artifact</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        Name/Source Direction     Type  \\\n",
       "0  s3://...2020-12-29-22-19-21-922/output/bert-test     Input  DataSet   \n",
       "1  s3://...2-29-22-19-21-922/output/bert-validation     Input  DataSet   \n",
       "2  s3://...020-12-29-22-19-21-922/output/bert-train     Input  DataSet   \n",
       "3  76310...ws.com/tensorflow-training:2.1.0-cpu-py3     Input    Image   \n",
       "4                                      model.tar.gz    Output    Model   \n",
       "\n",
       "  Association Type Lineage Type  \n",
       "0    ContributedTo     artifact  \n",
       "1    ContributedTo     artifact  \n",
       "2    ContributedTo     artifact  \n",
       "3    ContributedTo     artifact  \n",
       "4         Produced     artifact  "
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import time\n",
    "from sagemaker.lineage.visualizer import LineageTableVisualizer\n",
    "\n",
    "viz = LineageTableVisualizer(sagemaker.session.Session())\n",
    "\n",
    "df = viz.show(training_job_name='pipelines-jvk06bcbvsxx-Train-ja5pJvXyWE')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Artifacts for Model Package"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "from sagemaker.lineage.visualizer import LineageTableVisualizer\n",
    "\n",
    "viz = LineageTableVisualizer(sagemaker.session.Session())\n",
    "\n",
    "df = viz.show(model_package_arn='arn:aws:sagemaker:us-east-1:231218423789:model-package-group/bert-reviews-16092802964020664')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Artifacts for Trial Component"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"TrialComponentSummaries\": [\n",
      "        {\n",
      "            \"TrialComponentName\": \"tensorflow-training-2020-12-18-18-56-45-812-aws-training-job\",\n",
      "            \"TrialComponentArn\": \"arn:aws:sagemaker:us-east-1:231218423789:experiment-trial-component/tensorflow-training-2020-12-18-18-56-45-812-aws-training-job\",\n",
      "            \"DisplayName\": \"train\",\n",
      "            \"TrialComponentSource\": {\n",
      "                \"SourceArn\": \"arn:aws:sagemaker:us-east-1:231218423789:training-job/tensorflow-training-2020-12-18-18-56-45-812\",\n",
      "                \"SourceType\": \"SageMakerTrainingJob\"\n",
      "            },\n",
      "            \"Status\": {\n",
      "                \"PrimaryStatus\": \"Stopped\",\n",
      "                \"Message\": \"Status: Stopped, secondary status: Stopped, failure reason: .\"\n",
      "            },\n",
      "            \"CreationTime\": 1608317808.144,\n",
      "            \"CreatedBy\": {},\n",
      "            \"LastModifiedTime\": 1608325164.65,\n",
      "            \"LastModifiedBy\": {}\n",
      "        },\n",
      "        {\n",
      "            \"TrialComponentName\": \"TrialComponent-2020-12-18-185428-cgwf\",\n",
      "            \"TrialComponentArn\": \"arn:aws:sagemaker:us-east-1:231218423789:experiment-trial-component/trialcomponent-2020-12-18-185428-cgwf\",\n",
      "            \"DisplayName\": \"prepare\",\n",
      "            \"CreationTime\": 1608317668.922,\n",
      "            \"CreatedBy\": {},\n",
      "            \"LastModifiedTime\": 1608317672.919,\n",
      "            \"LastModifiedBy\": {}\n",
      "        }\n",
      "    ]\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "!aws sagemaker list-trial-components \\\n",
    "    --experiment-name 'Amazon-Customer-Reviews-BERT-Experiment-1608317665'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name/Source</th>\n",
       "      <th>Direction</th>\n",
       "      <th>Type</th>\n",
       "      <th>Association Type</th>\n",
       "      <th>Lineage Type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: [Name/Source, Direction, Type, Association Type, Lineage Type]\n",
       "Index: []"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import time\n",
    "from sagemaker.lineage.visualizer import LineageTableVisualizer\n",
    "\n",
    "viz = LineageTableVisualizer(sagemaker.session.Session())\n",
    "\n",
    "df = viz.show(trial_component_name='tensorflow-training-2020-12-18-18-56-45-812-aws-training-job')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Artifacts for All Pipeline Steps"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "from sagemaker.lineage.visualizer import LineageTableVisualizer\n",
    "\n",
    "\n",
    "viz = LineageTableVisualizer(sagemaker.session.Session())\n",
    "for execution_step in reversed(execution.list_steps()):\n",
    "    print(execution_step)\n",
    "    display(viz.show(pipeline_execution_step=execution_step))\n",
    "    time.sleep(5)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cleanup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def delete_associations(arn):\n",
    "    # delete incoming associations\n",
    "    incoming_associations = Association.list(destination_arn=arn)\n",
    "    for summary in incoming_associations:\n",
    "        assct = Association(\n",
    "            source_arn=summary.source_arn, \n",
    "            destination_arn=summary.destination_arn,\n",
    "            sagemaker_session=sagemaker_session)\n",
    "        assct.delete()\n",
    "        time.sleep(3)\n",
    "    \n",
    "    # delete outgoing associations\n",
    "    outgoing_associations = Association.list(source_arn=arn)\n",
    "    for summary in outgoing_associations:\n",
    "        assct = Association(\n",
    "            source_arn=summary.source_arn, \n",
    "            destination_arn=summary.destination_arn,\n",
    "            sagemaker_session=sagemaker_session)\n",
    "        assct.delete()\n",
    "        time.sleep(3)        \n",
    "\n",
    "import time\n",
    "\n",
    "def delete_lineage_data():\n",
    "    for summary in Context.list():\n",
    "        print(f'Deleting context {summary.context_name}')\n",
    "        delete_associations(summary.context_arn)\n",
    "        ctx = Context(context_name=summary.context_name, sagemaker_session=sagemaker_session)        \n",
    "        ctx.delete()\n",
    "        time.sleep(3)\n",
    "\n",
    "    for summary in Action.list():\n",
    "        print(f'Deleting action {summary.action_name}')\n",
    "        delete_associations(summary.action_arn)\n",
    "        actn = Action(action_name=summary.action_name, sagemaker_session=sagemaker_session)\n",
    "        actn.delete()\n",
    "        time.sleep(3)        \n",
    "\n",
    "    for summary in Artifact.list():\n",
    "        print(f'Deleting artifact {summary.artifact_arn} {summary.artifact_name}')\n",
    "        delete_associations(summary.artifact_arn)\n",
    "        artfct = Artifact(artifact_arn=summary.artifact_arn, sagemaker_session=sagemaker_session)\n",
    "        artfct.delete()\n",
    "        time.sleep(3)        \n",
    "\n",
    "delete_lineage_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Caveats\n",
    "\n",
    "* Associations cannot be created between two experiment entities. For example between an Experiment and Trial.\n",
    "* Associations can only be created between the following resources: Experiment, Trial, Trial Component, Action, Artifact, or Context.\n",
    "* The maximum number of manually created lineage entities are:\n",
    "  * Artifacts: 6000\n",
    "  * Contexts: 500\n",
    "  * Actions: 3000\n",
    "  * Associations: 6000\n",
    "* There is no limit on the number of lineage entities created automatically by SageMaker."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Contact\n",
    "\n",
    "Submit any questions or issues to https://github.com/aws/sagemaker-experiments/issues or mention @aws/sagemakerexperimentsadmin"
   ]
  }
 ],
 "metadata": {
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3 (Data Science)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
